Dataset statistics
| Number of variables | 16 |
|---|---|
| Number of observations | 18671 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 1.9 MiB |
| Average record size in memory | 104.0 B |
Variable types
| Numeric | 9 |
|---|---|
| Categorical | 7 |
survey_id has constant value "1476" | Constant |
city has constant value "Amsterdam" | Constant |
name has a high cardinality: 18150 distinct values | High cardinality |
last_modified has a high cardinality: 18671 distinct values | High cardinality |
location has a high cardinality: 18671 distinct values | High cardinality |
room_id is highly correlated with host_id | High correlation |
host_id is highly correlated with room_id | High correlation |
accommodates is highly correlated with bedrooms and 1 other field | High correlation |
bedrooms is highly correlated with accommodates | High correlation |
price is highly correlated with accommodates | High correlation |
room_id is highly correlated with reviews | High correlation |
reviews is highly correlated with room_id and 1 other field | High correlation |
overall_satisfaction is highly correlated with reviews | High correlation |
bedrooms is highly correlated with accommodates and 1 other field | High correlation |
price is highly correlated with accommodates and 1 other field | High correlation |
reviews is highly correlated with overall_satisfaction | High correlation |
accommodates is highly correlated with bedrooms | High correlation |
latitude is highly correlated with longitude and 1 other field | High correlation |
longitude is highly correlated with latitude and 1 other field | High correlation |
neighborhood is highly correlated with latitude and 1 other field | High correlation |
survey_id is highly correlated with city and 2 other fields | High correlation |
city is highly correlated with survey_id and 2 other fields | High correlation |
neighborhood is highly correlated with survey_id and 1 other field | High correlation |
room_type is highly correlated with survey_id and 1 other field | High correlation |
name is uniformly distributed | Uniform |
last_modified is uniformly distributed | Uniform |
location is uniformly distributed | Uniform |
room_id has unique values | Unique |
last_modified has unique values | Unique |
location has unique values | Unique |
reviews has 2973 (15.9%) zeros | Zeros |
overall_satisfaction has 5725 (30.7%) zeros | Zeros |
bedrooms has 1148 (6.1%) zeros | Zeros |
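The zero shares quoted in the alerts follow from the 18671 observations reported under Dataset statistics. A minimal check in Python, using only figures from this report (the helper name `zero_share` is illustrative and not part of pandas-profiling):

```python
# Zero counts copied from the alerts; row count from "Dataset statistics".
N_OBSERVATIONS = 18671
ZERO_COUNTS = {"reviews": 2973, "overall_satisfaction": 5725, "bedrooms": 1148}


def zero_share(count: int, total: int = N_OBSERVATIONS) -> float:
    """Return the percentage of rows equal to zero, rounded to one decimal."""
    return round(100 * count / total, 1)


zero_shares = {name: zero_share(count) for name, count in ZERO_COUNTS.items()}
print(zero_shares)
```

The rounded shares agree with the percentages shown in the three Zeros alerts.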
Reproduction
| Analysis started | 2021-09-07 09:51:27.925443 |
|---|---|
| Analysis finished | 2021-09-07 09:51:51.429514 |
| Duration | 23.5 seconds |
| Software version | pandas-profiling v3.0.0 |
| Download configuration | config.json |
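A report of this kind can be regenerated with a few lines of Python. The sketch below assumes that pandas and pandas-profiling 3.x are installed. The CSV path, report title, and output file name are placeholders, since this report does not name its input file, and the function name `build_profile` is illustrative.

```python
def build_profile(csv_path: str, output_html: str = "report.html") -> None:
    """Profile a CSV file and write an HTML report (paths are placeholders)."""
    # Imported lazily so this module still loads where the libraries are absent.
    import pandas as pd
    from pandas_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    profile = ProfileReport(df, title="Dataset profile")
    profile.to_file(output_html)


# Example call with a hypothetical file name:
# build_profile("listings.csv")
```

This uses the library defaults; the settings behind the original run are the ones recorded in config.json, which are not shown here.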
room_id
| Distinct | 18671 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 11210389.23 |
| Minimum | 2818 |
| Maximum | 20003728 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 145.9 KiB |
Quantile statistics
| Minimum | 2818 |
|---|---|
| 5-th percentile | 1018186 |
| Q1 | 6046700 |
| median | 12296977 |
| Q3 | 16624424.5 |
| 95-th percentile | 19578981.5 |
| Maximum | 20003728 |
| Range | 20000910 |
| Interquartile range (IQR) | 10577724.5 |
Descriptive statistics
| Standard deviation | 6087345.603 |
|---|---|
| Coefficient of variation (CV) | 0.5430092996 |
| Kurtosis | -1.227842256 |
| Mean | 11210389.23 |
| Median Absolute Deviation (MAD) | 5238856 |
| Skewness | -0.2557031374 |
| Sum | 2.093091773 × 10¹¹ |
| Variance | 3.705577649 × 10¹³ |
| Monotonicity | Not monotonic |
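Several of the figures above are derived from others in the same tables. As a quick consistency check for this variable, the sketch below recomputes the range, interquartile range, and coefficient of variation from the reported minimum, maximum, quartiles, mean, and standard deviation. The variable names are illustrative.

```python
# Figures copied from the quantile and descriptive statistics tables above.
minimum, maximum = 2818, 20003728
q1, q3 = 6046700, 16624424.5
mean, std = 11210389.23, 6087345.603

value_range = maximum - minimum  # spread between the extremes
iqr = q3 - q1                    # spread of the middle 50% of rows
cv = std / mean                  # standard deviation relative to the mean

print(value_range, iqr, round(cv, 4))
```

The results agree with the Range, Interquartile range (IQR), and Coefficient of variation (CV) rows, up to the rounding of the inputs.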
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 19075070 | 1 | < 0.1% |
| 15901265 | 1 | < 0.1% |
| 11597089 | 1 | < 0.1% |
| 1333886 | 1 | < 0.1% |
| 13806208 | 1 | < 0.1% |
| 3521399 | 1 | < 0.1% |
| 11614850 | 1 | < 0.1% |
| 18264707 | 1 | < 0.1% |
| 4266630 | 1 | < 0.1% |
| 8281117 | 1 | < 0.1% |
| Other values (18661) | 18661 |
| Value | Count | Frequency (%) |
| 2818 | 1 | |
| 3209 | 1 | |
| 20168 | 1 | |
| 25428 | 1 | |
| 25488 | 1 | |
| 27886 | 1 | |
| 28658 | 1 | |
| 28871 | 1 | |
| 29051 | 1 | |
| 29554 | 1 |
| Value | Count | Frequency (%) |
| 20003728 | 1 | |
| 19996091 | 1 | |
| 19995673 | 1 | |
| 19995327 | 1 | |
| 19995246 | 1 | |
| 19995106 | 1 | |
| 19994262 | 1 | |
| 19992677 | 1 | |
| 19992596 | 1 | |
| 19992241 | 1 |
survey_id
| Distinct | 1 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 73.0 KiB |
Length
| Max length | 4 |
|---|---|
| Median length | 4 |
| Mean length | 4 |
| Min length | 4 |
Characters and Unicode
| Total characters | 74684 |
|---|---|
| Distinct characters | 4 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1476 |
|---|---|
| 2nd row | 1476 |
| 3rd row | 1476 |
| 4th row | 1476 |
| 5th row | 1476 |
Common Values
| Value | Count | Frequency (%) |
| 1476 | 18671 |
Length
Histogram of lengths of the category
Pie chart
| Value | Count | Frequency (%) |
| 1476 | 18671 |
Most occurring characters
| Value | Count | Frequency (%) |
| 1 | 18671 | |
| 4 | 18671 | |
| 7 | 18671 | |
| 6 | 18671 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 74684 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 1 | 18671 | |
| 4 | 18671 | |
| 7 | 18671 | |
| 6 | 18671 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 74684 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 1 | 18671 | |
| 4 | 18671 | |
| 7 | 18671 | |
| 6 | 18671 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 74684 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 1 | 18671 | |
| 4 | 18671 | |
| 7 | 18671 | |
| 6 | 18671 |
host_id
| Distinct | 15897 |
|---|---|
| Distinct (%) | 85.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 35792660.12 |
| Minimum | 2234 |
| Maximum | 141831915 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 145.9 KiB |
Quantile statistics
| Minimum | 2234 |
|---|---|
| 5-th percentile | 1477248 |
| Q1 | 7126211.5 |
| median | 19884429 |
| Q3 | 52033129 |
| 95-th percentile | 121991189 |
| Maximum | 141831915 |
| Range | 141829681 |
| Interquartile range (IQR) | 44906917.5 |
Descriptive statistics
| Standard deviation | 37613303.54 |
|---|---|
| Coefficient of variation (CV) | 1.05086639 |
| Kurtosis | 0.4850144549 |
| Mean | 35792660.12 |
| Median Absolute Deviation (MAD) | 15789616 |
| Skewness | 1.242943803 |
| Sum | 6.682847572 × 10¹¹ |
| Variance | 1.414760603 × 10¹⁵ |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 48703385 | 93 | 0.5% |
| 113977564 | 88 | 0.5% |
| 1464510 | 71 | 0.4% |
| 107745142 | 64 | 0.3% |
| 84453740 | 61 | 0.3% |
| 65859990 | 54 | 0.3% |
| 517215 | 52 | 0.3% |
| 46691672 | 43 | 0.2% |
| 84449589 | 37 | 0.2% |
| 669178 | 36 | 0.2% |
| Other values (15887) | 18072 |
| Value | Count | Frequency (%) |
| 2234 | 1 | |
| 3159 | 1 | |
| 3806 | 1 | |
| 5988 | 2 | |
| 7924 | 1 | |
| 12085 | 1 | |
| 20405 | 1 | |
| 34080 | 1 | |
| 36701 | 1 | |
| 40786 | 1 |
| Value | Count | Frequency (%) |
| 141831915 | 1 | < 0.1% |
| 141749109 | 1 | < 0.1% |
| 141747815 | 1 | < 0.1% |
| 141665148 | 4 | |
| 141658022 | 1 | < 0.1% |
| 141648682 | 1 | < 0.1% |
| 141551211 | 1 | < 0.1% |
| 141548705 | 1 | < 0.1% |
| 141542351 | 1 | < 0.1% |
| 141534602 | 1 | < 0.1% |
room_type
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 73.0 KiB |
Length
| Max length | 15 |
|---|---|
| Median length | 15 |
| Mean length | 14.39665792 |
| Min length | 11 |
Characters and Unicode
| Total characters | 268800 |
|---|---|
| Distinct characters | 17 |
| Distinct categories | 4 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Shared room |
|---|---|
| 2nd row | Shared room |
| 3rd row | Shared room |
| 4th row | Shared room |
| 5th row | Shared room |
Common Values
| Value | Count | Frequency (%) |
| Entire home/apt | 14937 | |
| Private room | 3671 | 19.7% |
| Shared room | 63 | 0.3% |
Length
Histogram of lengths of the category
Pie chart
| Value | Count | Frequency (%) |
| home/apt | 14937 | |
| entire | 14937 | |
| room | 3734 | 10.0% |
| private | 3671 | 9.8% |
| shared | 63 | 0.2% |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 33608 | |
| t | 33545 | |
| r | 22405 | |
| o | 22405 | |
| a | 18671 | 6.9% |
| (space) | 18671 | 6.9% |
| m | 18671 | 6.9% |
| i | 18608 | 6.9% |
| h | 15000 | 5.6% |
| E | 14937 | 5.6% |
| Other values (7) | 52279 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 216521 | |
| Uppercase Letter | 18671 | 6.9% |
| Space Separator | 18671 | 6.9% |
| Other Punctuation | 14937 | 5.6% |
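The category counts above can be reproduced with Python's standard `unicodedata` module, which exposes the general category of each code point. The sketch below weights each distinct value by its row count from the Common Values table. It illustrates the idea and is not pandas-profiling's own implementation.

```python
import unicodedata
from collections import Counter

# Row counts per category value, copied from the Common Values table.
VALUE_COUNTS = {"Entire home/apt": 14937, "Private room": 3671, "Shared room": 63}

category_totals = Counter()
for text, rows in VALUE_COUNTS.items():
    for char in text:
        # unicodedata.category returns a two-letter code such as "Ll" or "Zs".
        category_totals[unicodedata.category(char)] += rows

total_characters = sum(category_totals.values())
mean_length = total_characters / sum(VALUE_COUNTS.values())
print(category_totals, total_characters, round(mean_length, 4))
```

Here `Ll`, `Lu`, `Zs`, and `Po` are the Unicode codes for Lowercase Letter, Uppercase Letter, Space Separator, and Other Punctuation. The totals agree with the Total characters and Mean length rows reported for this variable.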
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 33608 | |
| t | 33545 | |
| r | 22405 | |
| o | 22405 | |
| a | 18671 | |
| m | 18671 | |
| i | 18608 | |
| h | 15000 | |
| n | 14937 | |
| p | 14937 | |
| Other values (2) | 3734 | 1.7% |
Uppercase Letter
| Value | Count | Frequency (%) |
| E | 14937 | |
| P | 3671 | 19.7% |
| S | 63 | 0.3% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 18671 |
Other Punctuation
| Value | Count | Frequency (%) |
| / | 14937 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 235192 | |
| Common | 33608 | 12.5% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 33608 | |
| t | 33545 | |
| r | 22405 | |
| o | 22405 | |
| a | 18671 | |
| m | 18671 | |
| i | 18608 | |
| h | 15000 | |
| E | 14937 | |
| n | 14937 | |
| Other values (5) | 22405 |
Common
| Value | Count | Frequency (%) |
| (space) | 18671 | |
| / | 14937 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 268800 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| e | 33608 | |
| t | 33545 | |
| r | 22405 | |
| o | 22405 | |
| a | 18671 | 6.9% |
| (space) | 18671 | 6.9% |
| m | 18671 | 6.9% |
| i | 18608 | 6.9% |
| h | 15000 | 5.6% |
| E | 14937 | 5.6% |
| Other values (7) | 52279 |
city
| Distinct | 1 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 73.0 KiB |
Length
| Max length | 9 |
|---|---|
| Median length | 9 |
| Mean length | 9 |
| Min length | 9 |
Characters and Unicode
| Total characters | 168039 |
|---|---|
| Distinct characters | 8 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Amsterdam |
|---|---|
| 2nd row | Amsterdam |
| 3rd row | Amsterdam |
| 4th row | Amsterdam |
| 5th row | Amsterdam |
Common Values
| Value | Count | Frequency (%) |
| Amsterdam | 18671 |
Length
Histogram of lengths of the category
Pie chart
| Value | Count | Frequency (%) |
| amsterdam | 18671 |
Most occurring characters
| Value | Count | Frequency (%) |
| m | 37342 | |
| A | 18671 | |
| s | 18671 | |
| t | 18671 | |
| e | 18671 | |
| r | 18671 | |
| d | 18671 | |
| a | 18671 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 149368 | |
| Uppercase Letter | 18671 | 11.1% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| m | 37342 | |
| s | 18671 | |
| t | 18671 | |
| e | 18671 | |
| r | 18671 | |
| d | 18671 | |
| a | 18671 |
Uppercase Letter
| Value | Count | Frequency (%) |
| A | 18671 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 168039 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| m | 37342 | |
| A | 18671 | |
| s | 18671 | |
| t | 18671 | |
| e | 18671 | |
| r | 18671 | |
| d | 18671 | |
| a | 18671 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 168039 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| m | 37342 | |
| A | 18671 | |
| s | 18671 | |
| t | 18671 | |
| e | 18671 | |
| r | 18671 | |
| d | 18671 | |
| a | 18671 |
neighborhood
| Distinct | 23 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 73.0 KiB |
Length
| Max length | 38 |
|---|---|
| Median length | 15 |
| Mean length | 17.5162016 |
| Min length | 6 |
Characters and Unicode
| Total characters | 327045 |
|---|---|
| Distinct characters | 43 |
| Distinct categories | 5 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | De Pijp / Rivierenbuurt |
|---|---|
| 2nd row | Centrum West |
| 3rd row | Watergraafsmeer |
| 4th row | Centrum West |
| 5th row | De Baarsjes / Oud West |
Common Values
| Value | Count | Frequency (%) |
| De Baarsjes / Oud West | 3276 | |
| De Pijp / Rivierenbuurt | 2371 | |
| Centrum West | 2216 | |
| Centrum Oost | 1727 | |
| Westerpark | 1428 | |
| Noord-West / Noord-Midden | 1415 | |
| Oud Oost | 1166 | 6.2% |
| Bos en Lommer | 983 | 5.3% |
| Oostelijk Havengebied / Indische Buurt | 920 | 4.9% |
| Watergraafsmeer | 514 | 2.8% |
| Other values (13) | 2655 |
Length
Histogram of lengths of the category
| Value | Count | Frequency (%) |
| (empty) | 8960 | |
| de | 5761 | |
| west | 5732 | |
| oud | 4936 | 8.8% |
| centrum | 4042 | 7.2% |
| baarsjes | 3276 | 5.8% |
| oost | 3211 | 5.7% |
| pijp | 2371 | 4.2% |
| rivierenbuurt | 2371 | 4.2% |
| westerpark | 1428 | 2.5% |
| Other values (27) | 14097 |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 41006 | 12.5% |
| (space) | 37514 | 11.5% |
| r | 24808 | 7.6% |
| s | 22145 | 6.8% |
| t | 22088 | 6.8% |
| u | 17123 | 5.2% |
| d | 14710 | 4.5% |
| o | 14559 | 4.5% |
| i | 12517 | 3.8% |
| a | 11891 | 3.6% |
| Other values (33) | 108684 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 228669 | |
| Uppercase Letter | 49072 | 15.0% |
| Space Separator | 37514 | 11.5% |
| Other Punctuation | 8960 | 2.7% |
| Dash Punctuation | 2830 | 0.9% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 41006 | |
| r | 24808 | |
| s | 22145 | |
| t | 22088 | |
| u | 17123 | |
| d | 14710 | 6.4% |
| o | 14559 | 6.4% |
| i | 12517 | 5.5% |
| a | 11891 | 5.2% |
| n | 11629 | 5.1% |
| Other values (13) | 36193 |
Uppercase Letter
| Value | Count | Frequency (%) |
| O | 9230 | |
| W | 9104 | |
| D | 5803 | |
| B | 5625 | |
| C | 4042 | |
| N | 3899 | |
| P | 2371 | 4.8% |
| R | 2371 | 4.8% |
| M | 1415 | 2.9% |
| I | 1297 | 2.6% |
| Other values (7) | 3915 |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 37514 |
Other Punctuation
| Value | Count | Frequency (%) |
| / | 8960 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 2830 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 277741 | |
| Common | 49304 | 15.1% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 41006 | |
| r | 24808 | 8.9% |
| s | 22145 | 8.0% |
| t | 22088 | 8.0% |
| u | 17123 | 6.2% |
| d | 14710 | 5.3% |
| o | 14559 | 5.2% |
| i | 12517 | 4.5% |
| a | 11891 | 4.3% |
| n | 11629 | 4.2% |
| Other values (30) | 85265 |
Common
| Value | Count | Frequency (%) |
| (space) | 37514 | |
| / | 8960 | 18.2% |
| - | 2830 | 5.7% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 327045 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| e | 41006 | 12.5% |
| (space) | 37514 | 11.5% |
| r | 24808 | 7.6% |
| s | 22145 | 6.8% |
| t | 22088 | 6.8% |
| u | 17123 | 5.2% |
| d | 14710 | 4.5% |
| o | 14559 | 4.5% |
| i | 12517 | 3.8% |
| a | 11891 | 3.6% |
| Other values (33) | 108684 |
reviews
| Distinct | 283 |
|---|---|
| Distinct (%) | 1.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 16.75721707 |
| Minimum | 0 |
| Maximum | 532 |
| Zeros | 2973 |
| Zeros (%) | 15.9% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 145.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 2 |
| median | 6 |
| Q3 | 17 |
| 95-th percentile | 67 |
| Maximum | 532 |
| Range | 532 |
| Interquartile range (IQR) | 15 |
Descriptive statistics
| Standard deviation | 33.52959861 |
|---|---|
| Coefficient of variation (CV) | 2.000904951 |
| Kurtosis | 43.77759527 |
| Mean | 16.75721707 |
| Median Absolute Deviation (MAD) | 6 |
| Skewness | 5.502837236 |
| Sum | 312874 |
| Variance | 1124.233983 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0 | 2973 | 15.9% |
| 1 | 1504 | 8.1% |
| 2 | 1240 | 6.6% |
| 3 | 1099 | 5.9% |
| 4 | 924 | 4.9% |
| 5 | 874 | 4.7% |
| 6 | 736 | 3.9% |
| 7 | 682 | 3.7% |
| 8 | 588 | 3.1% |
| 9 | 527 | 2.8% |
| Other values (273) | 7524 |
| Value | Count | Frequency (%) |
| 0 | 2973 | |
| 1 | 1504 | |
| 2 | 1240 | |
| 3 | 1099 | 5.9% |
| 4 | 924 | 4.9% |
| 5 | 874 | 4.7% |
| 6 | 736 | 3.9% |
| 7 | 682 | 3.7% |
| 8 | 588 | 3.1% |
| 9 | 527 | 2.8% |
| Value | Count | Frequency (%) |
| 532 | 1 | |
| 465 | 1 | |
| 463 | 1 | |
| 452 | 1 | |
| 447 | 1 | |
| 443 | 2 | |
| 433 | 1 | |
| 430 | 2 | |
| 425 | 1 | |
| 410 | 2 |
overall_satisfaction
| Distinct | 9 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.302956457 |
| Minimum | 0 |
| Maximum | 5 |
| Zeros | 5725 |
| Zeros (%) | 30.7% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 145.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 4.5 |
| Q3 | 5 |
| 95-th percentile | 5 |
| Maximum | 5 |
| Range | 5 |
| Interquartile range (IQR) | 5 |
Descriptive statistics
| Standard deviation | 2.212851115 |
|---|---|
| Coefficient of variation (CV) | 0.669960729 |
| Kurtosis | -1.314098148 |
| Mean | 3.302956457 |
| Median Absolute Deviation (MAD) | 0.5 |
| Skewness | -0.7945374407 |
| Sum | 61669.5 |
| Variance | 4.896710059 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=9)
| Value | Count | Frequency (%) |
| 5 | 7693 | |
| 0 | 5725 | |
| 4.5 | 4546 | |
| 4 | 576 | 3.1% |
| 3.5 | 109 | 0.6% |
| 3 | 19 | 0.1% |
| 1 | 1 | < 0.1% |
| 2.5 | 1 | < 0.1% |
| 1.5 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 0 | 5725 | |
| 1 | 1 | < 0.1% |
| 1.5 | 1 | < 0.1% |
| 2.5 | 1 | < 0.1% |
| 3 | 19 | 0.1% |
| 3.5 | 109 | 0.6% |
| 4 | 576 | 3.1% |
| 4.5 | 4546 | |
| 5 | 7693 |
| Value | Count | Frequency (%) |
| 5 | 7693 | |
| 4.5 | 4546 | |
| 4 | 576 | 3.1% |
| 3.5 | 109 | 0.6% |
| 3 | 19 | 0.1% |
| 2.5 | 1 | < 0.1% |
| 1.5 | 1 | < 0.1% |
| 1 | 1 | < 0.1% |
| 0 | 5725 |
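Because this variable takes only nine distinct values, its sum, mean, and zero share can be recomputed directly from the value counts listed above. A short sketch, with the counts copied from this report:

```python
# rating -> number of rows, copied from the value-count table above.
RATING_COUNTS = {5: 7693, 0: 5725, 4.5: 4546, 4: 576, 3.5: 109,
                 3: 19, 1: 1, 2.5: 1, 1.5: 1}

n_rows = sum(RATING_COUNTS.values())
total = sum(rating * rows for rating, rows in RATING_COUNTS.items())
mean_rating = total / n_rows
zero_pct = 100 * RATING_COUNTS[0] / n_rows

print(n_rows, total, round(mean_rating, 4), round(zero_pct, 1))
```

The recomputed values agree with the reported Sum (61669.5), Mean (3.302956457), and Zeros (30.7%).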
accommodates
| Distinct | 16 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.922875047 |
| Minimum | 1 |
| Maximum | 17 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 145.9 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 2 |
| median | 2 |
| Q3 | 4 |
| 95-th percentile | 5 |
| Maximum | 17 |
| Range | 16 |
| Interquartile range (IQR) | 2 |
Descriptive statistics
| Standard deviation | 1.327895671 |
|---|---|
| Coefficient of variation (CV) | 0.4543114741 |
| Kurtosis | 14.35766347 |
| Mean | 2.922875047 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 2.390197061 |
| Sum | 54573 |
| Variance | 1.763306913 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=16)
| Value | Count | Frequency (%) |
| 2 | 9991 | |
| 4 | 5571 | |
| 3 | 1579 | 8.5% |
| 6 | 473 | 2.5% |
| 5 | 471 | 2.5% |
| 1 | 365 | 2.0% |
| 8 | 105 | 0.6% |
| 7 | 52 | 0.3% |
| 16 | 20 | 0.1% |
| 10 | 16 | 0.1% |
| Other values (6) | 28 | 0.1% |
| Value | Count | Frequency (%) |
| 1 | 365 | 2.0% |
| 2 | 9991 | |
| 3 | 1579 | 8.5% |
| 4 | 5571 | |
| 5 | 471 | 2.5% |
| 6 | 473 | 2.5% |
| 7 | 52 | 0.3% |
| 8 | 105 | 0.6% |
| 9 | 8 | < 0.1% |
| 10 | 16 | 0.1% |
| Value | Count | Frequency (%) |
| 17 | 1 | < 0.1% |
| 16 | 20 | 0.1% |
| 14 | 6 | < 0.1% |
| 13 | 1 | < 0.1% |
| 12 | 10 | 0.1% |
| 11 | 2 | < 0.1% |
| 10 | 16 | 0.1% |
| 9 | 8 | < 0.1% |
| 8 | 105 | |
| 7 | 52 |
bedrooms
| Distinct | 11 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.430989235 |
| Minimum | 0 |
| Maximum | 10 |
| Zeros | 1148 |
| Zeros (%) | 6.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 145.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 1 |
| median | 1 |
| Q3 | 2 |
| 95-th percentile | 3 |
| Maximum | 10 |
| Range | 10 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 0.8792321975 |
|---|---|
| Coefficient of variation (CV) | 0.6144226498 |
| Kurtosis | 5.629498729 |
| Mean | 1.430989235 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 1.602014005 |
| Sum | 26718 |
| Variance | 0.773049257 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=11)
| Value | Count | Frequency (%) |
| 1 | 11068 | |
| 2 | 4446 | |
| 3 | 1442 | 7.7% |
| 0 | 1148 | 6.1% |
| 4 | 472 | 2.5% |
| 5 | 62 | 0.3% |
| 6 | 19 | 0.1% |
| 10 | 5 | < 0.1% |
| 7 | 4 | < 0.1% |
| 8 | 3 | < 0.1% |
| Value | Count | Frequency (%) |
| 0 | 1148 | 6.1% |
| 1 | 11068 | |
| 2 | 4446 | |
| 3 | 1442 | 7.7% |
| 4 | 472 | 2.5% |
| 5 | 62 | 0.3% |
| 6 | 19 | 0.1% |
| 7 | 4 | < 0.1% |
| 8 | 3 | < 0.1% |
| 9 | 2 | < 0.1% |
| Value | Count | Frequency (%) |
| 10 | 5 | < 0.1% |
| 9 | 2 | < 0.1% |
| 8 | 3 | < 0.1% |
| 7 | 4 | < 0.1% |
| 6 | 19 | 0.1% |
| 5 | 62 | 0.3% |
| 4 | 472 | 2.5% |
| 3 | 1442 | 7.7% |
| 2 | 4446 | |
| 1 | 11068 |
price
| Distinct | 423 |
|---|---|
| Distinct (%) | 2.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 166.6377805 |
| Minimum | 12 |
| Maximum | 6000 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 145.9 KiB |
Quantile statistics
| Minimum | 12 |
|---|---|
| 5-th percentile | 72 |
| Q1 | 108 |
| median | 144 |
| Q3 | 192 |
| 95-th percentile | 330 |
| Maximum | 6000 |
| Range | 5988 |
| Interquartile range (IQR) | 84 |
Descriptive statistics
| Standard deviation | 108.9759641 |
|---|---|
| Coefficient of variation (CV) | 0.6539691286 |
| Kurtosis | 522.6742749 |
| Mean | 166.6377805 |
| Median Absolute Deviation (MAD) | 36 |
| Skewness | 12.78844196 |
| Sum | 3111294 |
| Variance | 11875.76076 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 119 | 1016 | 5.4% |
| 180 | 998 | 5.3% |
| 144 | 884 | 4.7% |
| 150 | 619 | 3.3% |
| 132 | 588 | 3.1% |
| 108 | 560 | 3.0% |
| 96 | 517 | 2.8% |
| 118 | 508 | 2.7% |
| 114 | 507 | 2.7% |
| 240 | 492 | 2.6% |
| Other values (413) | 11982 |
| Value | Count | Frequency (%) |
| 12 | 1 | < 0.1% |
| 18 | 1 | < 0.1% |
| 21 | 1 | < 0.1% |
| 22 | 1 | < 0.1% |
| 23 | 1 | < 0.1% |
| 24 | 6 | |
| 25 | 1 | < 0.1% |
| 28 | 1 | < 0.1% |
| 29 | 2 | < 0.1% |
| 30 | 6 |
| Value | Count | Frequency (%) |
| 6000 | 1 | |
| 3770 | 1 | |
| 1920 | 1 | |
| 1799 | 1 | |
| 1558 | 1 | |
| 1428 | 1 | |
| 1412 | 1 | |
| 1386 | 1 | |
| 1343 | 1 | |
| 1319 | 1 |
name
| Distinct | 18150 |
|---|---|
| Distinct (%) | 97.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 73.0 KiB |
Length
| Max length | 78 |
|---|---|
| Median length | 35 |
| Mean length | 36.09233571 |
| Min length | 1 |
Characters and Unicode
| Total characters | 673880 |
|---|---|
| Distinct characters | 157 |
| Distinct categories | 20 |
| Distinct scripts | 4 |
| Distinct blocks | 10 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 17814 |
|---|---|
| Unique (%) | 95.4% |
Sample
| 1st row | Red Light/ Canal view apartment (Shared) |
|---|---|
| 2nd row | Sunny and Cozy Living room in quite neighbours |
| 3rd row | Amsterdam |
| 4th row | Canal boat RIDE in Amsterdam |
| 5th row | One room for rent in a three room appartment |
Common Values
| Value | Count | Frequency (%) |
| Amsterdam | 36 | 0.2% |
| Lovely apartment near Vondelpark | 10 | 0.1% |
| Spacious family house with garden | 8 | < 0.1% |
| Beautiful apartment in Amsterdam | 8 | < 0.1% |
| Magnificent panoramic city view | 8 | < 0.1% |
| Cosy apartment in Amsterdam | 8 | < 0.1% |
| Lovely apartment in Amsterdam | 7 | < 0.1% |
| Nice comfy room, magnificent view | 7 | < 0.1% |
| Spacious apartment near Vondelpark | 7 | < 0.1% |
| Cosy apartment near Vondelpark | 6 | < 0.1% |
| Other values (18140) | 18566 |
Length
Histogram of lengths of the category
| Value | Count | Frequency (%) |
| apartment | 7118 | 6.7% |
| in | 5730 | 5.4% |
| amsterdam | 3588 | 3.4% |
| (empty) | 3195 | 3.0% |
| with | 2669 | 2.5% |
| the | 2165 | 2.0% |
| spacious | 2082 | 2.0% |
| city | 1906 | 1.8% |
| centre | 1768 | 1.7% |
| room | 1728 | 1.6% |
| Other values (4867) | 73723 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 87491 | 13.0% |
| e | 59230 | 8.8% |
| t | 55217 | 8.2% |
| a | 52626 | 7.8% |
| r | 42831 | 6.4% |
| n | 39759 | 5.9% |
| o | 35472 | 5.3% |
| i | 32482 | 4.8% |
| m | 26379 | 3.9% |
| s | 21195 | 3.1% |
| Other values (147) | 221198 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 510398 | |
| Space Separator | 87492 | 13.0% |
| Uppercase Letter | 54936 | 8.2% |
| Other Punctuation | 11184 | 1.7% |
| Decimal Number | 5572 | 0.8% |
| Dash Punctuation | 1595 | 0.2% |
| Math Symbol | 1136 | 0.2% |
| Close Punctuation | 621 | 0.1% |
| Open Punctuation | 588 | 0.1% |
| Other Symbol | 236 | < 0.1% |
| Other values (10) | 122 | < 0.1% |
Most frequent character per category
Other Letter
| Value | Count | Frequency (%) |
| 阿 | 2 | 5.1% |
| 姆 | 2 | 5.1% |
| 斯 | 2 | 5.1% |
| 特 | 2 | 5.1% |
| 丹 | 2 | 5.1% |
| 公 | 2 | 5.1% |
| 到 | 2 | 5.1% |
| 獨 | 1 | 2.6% |
| 立 | 1 | 2.6% |
| 寓 | 1 | 2.6% |
| Other values (22) | 22 |
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 59230 | |
| t | 55217 | |
| a | 52626 | |
| r | 42831 | 8.4% |
| n | 39759 | 7.8% |
| o | 35472 | 6.9% |
| i | 32482 | 6.4% |
| m | 26379 | 5.2% |
| s | 21195 | 4.2% |
| p | 19825 | 3.9% |
| Other values (20) | 125382 |
Uppercase Letter
| Value | Count | Frequency (%) |
| A | 8892 | |
| C | 6863 | |
| S | 4399 | 8.0% |
| L | 3283 | 6.0% |
| B | 3251 | 5.9% |
| R | 2791 | 5.1% |
| P | 2694 | 4.9% |
| E | 2341 | 4.3% |
| T | 2219 | 4.0% |
| N | 2194 | 4.0% |
| Other values (17) | 16009 |
Other Punctuation
| Value | Count | Frequency (%) |
| , | 2817 | |
| ! | 2756 | |
| & | 1686 | |
| . | 1473 | |
| ' | 831 | 7.4% |
| / | 587 | 5.2% |
| @ | 315 | 2.8% |
| " | 285 | 2.5% |
| : | 189 | 1.7% |
| * | 154 | 1.4% |
| Other values (7) | 91 | 0.8% |
Decimal Number
| Value | Count | Frequency (%) |
| 2 | 1885 | |
| 1 | 992 | |
| 0 | 741 | 13.3% |
| 5 | 498 | 8.9% |
| 3 | 463 | 8.3% |
| 4 | 412 | 7.4% |
| 8 | 150 | 2.7% |
| 6 | 150 | 2.7% |
| 9 | 145 | 2.6% |
| 7 | 136 | 2.4% |
Other Symbol
| Value | Count | Frequency (%) |
| ★ | 171 | |
| ☆ | 33 | 14.0% |
| ❤ | 14 | 5.9% |
| ♡ | 5 | 2.1% |
| ♥ | 5 | 2.1% |
| ⭐ | 3 | 1.3% |
| ° | 3 | 1.3% |
| ☕ | 1 | 0.4% |
| ☺ | 1 | 0.4% |
Math Symbol
| Value | Count | Frequency (%) |
| + | 660 | |
| | | 460 | |
| < | 5 | 0.4% |
| > | 4 | 0.4% |
| = | 3 | 0.3% |
| ~ | 2 | 0.2% |
| ⊕ | 1 | 0.1% |
| ÷ | 1 | 0.1% |
Open Punctuation
| Value | Count | Frequency (%) |
| ( | 581 | |
| [ | 6 | 1.0% |
| 【 | 1 | 0.2% |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 614 | |
| ] | 6 | 1.0% |
| 】 | 1 | 0.2% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 87491 | |
| (other space character) | 1 | < 0.1% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 1593 | |
| – | 2 | 0.1% |
Nonspacing Mark
| Value | Count | Frequency (%) |
| ️ | 15 | |
| ︎ | 1 | 6.2% |
Control
| Value | Count | Frequency (%) |
| (control character) | 6 | |
| (control character) | 6 |
Final Punctuation
| Value | Count | Frequency (%) |
| ’ | 9 | |
| ” | 2 | 18.2% |
Initial Punctuation
| Value | Count | Frequency (%) |
| ‘ | 3 | |
| “ | 2 |
Currency Symbol
| Value | Count | Frequency (%) |
| € | 4 | |
| $ | 1 | 20.0% |
Other Number
| Value | Count | Frequency (%) |
| ² | 22 |
Connector Punctuation
| Value | Count | Frequency (%) |
| _ | 7 |
Modifier Symbol
| Value | Count | Frequency (%) |
| ´ | 4 |
Format
| Value | Count | Frequency (%) |
| | 1 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 565334 | |
| Common | 108491 | 16.1% |
| Han | 39 | < 0.1% |
| Inherited | 16 | < 0.1% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| (space) | 87491 | |
| , | 2817 | 2.6% |
| ! | 2756 | 2.5% |
| 2 | 1885 | 1.7% |
| & | 1686 | 1.6% |
| - | 1593 | 1.5% |
| . | 1473 | 1.4% |
| 1 | 992 | 0.9% |
| ' | 831 | 0.8% |
| 0 | 741 | 0.7% |
| Other values (56) | 6226 | 5.7% |
Latin
| Value | Count | Frequency (%) |
| e | 59230 | 10.5% |
| t | 55217 | 9.8% |
| a | 52626 | 9.3% |
| r | 42831 | 7.6% |
| n | 39759 | 7.0% |
| o | 35472 | 6.3% |
| i | 32482 | 5.7% |
| m | 26379 | 4.7% |
| s | 21195 | 3.7% |
| p | 19825 | 3.5% |
| Other values (47) | 180318 |
Han
| Value | Count | Frequency (%) |
| 阿 | 2 | 5.1% |
| 姆 | 2 | 5.1% |
| 斯 | 2 | 5.1% |
| 特 | 2 | 5.1% |
| 丹 | 2 | 5.1% |
| 公 | 2 | 5.1% |
| 到 | 2 | 5.1% |
| 獨 | 1 | 2.6% |
| 立 | 1 | 2.6% |
| 寓 | 1 | 2.6% |
| Other values (22) | 22 |
Inherited
| Value | Count | Frequency (%) |
| ️ | 15 | |
| ︎ | 1 | 6.2% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 673492 | |
| Misc Symbols | 216 | < 0.1% |
| Latin 1 Sup | 57 | < 0.1% |
| CJK | 39 | < 0.1% |
| Punctuation | 34 | < 0.1% |
| VS | 16 | < 0.1% |
| Dingbats | 14 | < 0.1% |
| None | 7 | < 0.1% |
| Currency Symbols | 4 | < 0.1% |
| Math Operators | 1 | < 0.1% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| (space) | 87491 | 13.0% |
| e | 59230 | 8.8% |
| t | 55217 | 8.2% |
| a | 52626 | 7.8% |
| r | 42831 | 6.4% |
| n | 39759 | 5.9% |
| o | 35472 | 5.3% |
| i | 32482 | 4.8% |
| m | 26379 | 3.9% |
| s | 21195 | 3.1% |
| Other values (82) | 220810 |
Latin 1 Sup
| Value | Count | Frequency (%) |
| ² | 22 | |
| é | 15 | |
| à | 4 | 7.0% |
| ´ | 4 | 7.0% |
| ° | 3 | 5.3% |
| É | 3 | 5.3% |
| á | 2 | 3.5% |
| ¡ | 1 | 1.8% |
| (non-printing character) | 1 | 1.8% |
| ÷ | 1 | 1.8% |
Misc Symbols
| Value | Count | Frequency (%) |
| ★ | 171 | |
| ☆ | 33 | 15.3% |
| ♡ | 5 | 2.3% |
| ♥ | 5 | 2.3% |
| ☕ | 1 | 0.5% |
| ☺ | 1 | 0.5% |
None
| Value | Count | Frequency (%) |
| ⭐ | 3 | |
| , | 2 | |
| 【 | 1 | 14.3% |
| 】 | 1 | 14.3% |
VS
| Value | Count | Frequency (%) |
| ️ | 15 | |
| ︎ | 1 | 6.2% |
Dingbats
| Value | Count | Frequency (%) |
| ❤ | 14 |
Punctuation
| Value | Count | Frequency (%) |
| • | 15 | |
| ’ | 9 | |
| ‘ | 3 | 8.8% |
| “ | 2 | 5.9% |
| ” | 2 | 5.9% |
| – | 2 | 5.9% |
| | 1 | 2.9% |
Math Operators
| Value | Count | Frequency (%) |
| ⊕ | 1 |
Currency Symbols
| Value | Count | Frequency (%) |
| € | 4 |
CJK
| Value | Count | Frequency (%) |
| 阿 | 2 | 5.1% |
| 姆 | 2 | 5.1% |
| 斯 | 2 | 5.1% |
| 特 | 2 | 5.1% |
| 丹 | 2 | 5.1% |
| 公 | 2 | 5.1% |
| 到 | 2 | 5.1% |
| 獨 | 1 | 2.6% |
| 立 | 1 | 2.6% |
| 寓 | 1 | 2.6% |
| Other values (22) | 22 |
| Distinct | 18671 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 73.0 KiB |
| 2017-07-22 23:13:49.922770 | 1 |
|---|---|
| 2017-07-22 16:09:30.267337 | 1 |
| 2017-07-23 13:06:01.692221 | 1 |
| 2017-07-23 06:03:12.052431 | 1 |
| 2017-07-23 05:53:53.038499 | 1 |
| Other values (18666) |
Length
| Max length | 26 |
|---|---|
| Median length | 26 |
| Mean length | 26 |
| Min length | 26 |
Characters and Unicode
| Total characters | 485446 |
|---|---|
| Distinct characters | 14 |
| Distinct categories | 4 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 18671 |
|---|---|
| Unique (%) | 100.0% |
Sample
| 1st row | 2017-07-23 13:06:27.391699 |
|---|---|
| 2nd row | 2017-07-23 13:06:23.607187 |
| 3rd row | 2017-07-23 13:06:23.603546 |
| 4th row | 2017-07-23 13:06:22.689787 |
| 5th row | 2017-07-23 13:06:19.681469 |
Common Values
| Value | Count | Frequency (%) |
| 2017-07-22 23:13:49.922770 | 1 | < 0.1% |
| 2017-07-22 16:09:30.267337 | 1 | < 0.1% |
| 2017-07-23 13:06:01.692221 | 1 | < 0.1% |
| 2017-07-23 06:03:12.052431 | 1 | < 0.1% |
| 2017-07-23 05:53:53.038499 | 1 | < 0.1% |
| 2017-07-22 22:48:25.121502 | 1 | < 0.1% |
| 2017-07-23 05:56:43.570859 | 1 | < 0.1% |
| 2017-07-22 17:36:18.383894 | 1 | < 0.1% |
| 2017-07-23 03:12:42.530410 | 1 | < 0.1% |
| 2017-07-23 03:30:30.634173 | 1 | < 0.1% |
| Other values (18661) | 18661 |
Length
Histogram of lengths of the category
| Value | Count | Frequency (%) |
| 2017-07-22 | 13653 | |
| 2017-07-23 | 5018 | 13.4% |
| 22:44:06.254758 | 1 | < 0.1% |
| 16:33:22.354714 | 1 | < 0.1% |
| 16:07:11.917813 | 1 | < 0.1% |
| 18:27:03.767369 | 1 | < 0.1% |
| 20:01:57.374192 | 1 | < 0.1% |
| 16:05:46.725142 | 1 | < 0.1% |
| 18:04:31.237361 | 1 | < 0.1% |
| 20:28:39.850183 | 1 | < 0.1% |
| Other values (18663) | 18663 |
Most occurring characters
| Value | Count | Frequency (%) |
| 2 | 81579 | |
| 0 | 65245 | |
| 7 | 55414 | |
| 1 | 48635 | |
| - | 37342 | |
| : | 37342 | |
| 3 | 29560 | 6.1% |
| 5 | 22440 | 4.6% |
| 4 | 19585 | 4.0% |
| 6 | 19230 | 4.0% |
| Other values (4) | 69074 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 373420 | |
| Other Punctuation | 56013 | 11.5% |
| Dash Punctuation | 37342 | 7.7% |
| Space Separator | 18671 | 3.8% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 2 | 81579 | |
| 0 | 65245 | |
| 7 | 55414 | |
| 1 | 48635 | |
| 3 | 29560 | 7.9% |
| 5 | 22440 | 6.0% |
| 4 | 19585 | 5.2% |
| 6 | 19230 | 5.1% |
| 8 | 16209 | 4.3% |
| 9 | 15523 | 4.2% |
Other Punctuation
| Value | Count | Frequency (%) |
| : | 37342 | 66.7% |
| . | 18671 | 33.3% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 37342 | 100.0% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 18671 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 485446 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 2 | 81579 | |
| 0 | 65245 | |
| 7 | 55414 | |
| 1 | 48635 | |
| - | 37342 | |
| : | 37342 | |
| 3 | 29560 | 6.1% |
| 5 | 22440 | 4.6% |
| 4 | 19585 | 4.0% |
| 6 | 19230 | 4.0% |
| Other values (4) | 69074 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 485446 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 2 | 81579 | |
| 0 | 65245 | |
| 7 | 55414 | |
| 1 | 48635 | |
| - | 37342 | |
| : | 37342 | |
| 3 | 29560 | 6.1% |
| 5 | 22440 | 4.6% |
| 4 | 19585 | 4.0% |
| 6 | 19230 | 4.0% |
| Other values (4) | 69074 |
| Distinct | 15560 |
|---|---|
| Distinct (%) | 83.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 52.36525919 |
| Minimum | 52.2962 |
|---|---|
| Maximum | 52.42498 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 145.9 KiB |
Quantile statistics
| Minimum | 52.2962 |
|---|---|
| 5-th percentile | 52.343284 |
| Q1 | 52.355253 |
| median | 52.364623 |
| Q3 | 52.3747995 |
| 95-th percentile | 52.3893735 |
| Maximum | 52.42498 |
| Range | 0.12878 |
| Interquartile range (IQR) | 0.0195465 |
Descriptive statistics
| Standard deviation | 0.01515023385 |
|---|---|
| Coefficient of variation (CV) | 0.0002893184162 |
| Kurtosis | 1.417392356 |
| Mean | 52.36525919 |
| Median Absolute Deviation (MAD) | 0.009737 |
| Skewness | 0.007665258349 |
| Sum | 977711.7543 |
| Variance | 0.0002295295858 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 52.354646 | 5 | < 0.1% |
| 52.366852 | 5 | < 0.1% |
| 52.361364 | 5 | < 0.1% |
| 52.362453 | 4 | < 0.1% |
| 52.379011 | 4 | < 0.1% |
| 52.354748 | 4 | < 0.1% |
| 52.37004 | 4 | < 0.1% |
| 52.363217 | 4 | < 0.1% |
| 52.369917 | 4 | < 0.1% |
| 52.355191 | 4 | < 0.1% |
| Other values (15550) | 18628 |
| Value | Count | Frequency (%) |
| 52.2962 | 1 | |
| 52.297203 | 1 | |
| 52.299763 | 1 | |
| 52.299846 | 1 | |
| 52.299875 | 1 | |
| 52.300105 | 1 | |
| 52.30013 | 1 | |
| 52.300915 | 1 | |
| 52.301257 | 1 | |
| 52.301683 | 1 |
| Value | Count | Frequency (%) |
| 52.42498 | 1 | |
| 52.424641 | 1 | |
| 52.424255 | 1 | |
| 52.423647 | 1 | |
| 52.423498 | 1 | |
| 52.423432 | 1 | |
| 52.423321 | 1 | |
| 52.422827 | 1 | |
| 52.422232 | 1 | |
| 52.422228 | 1 |
| Distinct | 17112 |
|---|---|
| Distinct (%) | 91.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.888602383 |
| Minimum | 4.763264 |
|---|---|
| Maximum | 5.027689 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 145.9 KiB |
Quantile statistics
| Minimum | 4.763264 |
|---|---|
| 5-th percentile | 4.845295 |
| Q1 | 4.864383 |
| median | 4.886012 |
| Q3 | 4.907499 |
| 95-th percentile | 4.9445575 |
| Maximum | 5.027689 |
| Range | 0.264425 |
| Interquartile range (IQR) | 0.043116 |
Descriptive statistics
| Standard deviation | 0.03455214945 |
|---|---|
| Coefficient of variation (CV) | 0.007067899318 |
| Kurtosis | 1.217272119 |
| Mean | 4.888602383 |
| Median Absolute Deviation (MAD) | 0.021551 |
| Skewness | 0.5376597501 |
| Sum | 91275.0951 |
| Variance | 0.001193851032 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 4.907187 | 5 | < 0.1% |
| 4.86301 | 4 | < 0.1% |
| 4.888738 | 4 | < 0.1% |
| 4.861512 | 4 | < 0.1% |
| 4.877004 | 4 | < 0.1% |
| 4.904646 | 4 | < 0.1% |
| 4.856525 | 4 | < 0.1% |
| 4.893506 | 4 | < 0.1% |
| 4.893017 | 4 | < 0.1% |
| 4.891267 | 4 | < 0.1% |
| Other values (17102) | 18630 |
| Value | Count | Frequency (%) |
| 4.763264 | 1 | |
| 4.768452 | 1 | |
| 4.769151 | 1 | |
| 4.771083 | 1 | |
| 4.772725 | 1 | |
| 4.772822 | 1 | |
| 4.775168 | 1 | |
| 4.775748 | 1 | |
| 4.77647 | 1 | |
| 4.77764 | 1 |
| Value | Count | Frequency (%) |
| 5.027689 | 1 | |
| 5.026701 | 1 | |
| 5.015737 | 1 | |
| 5.013557 | 1 | |
| 5.013316 | 1 | |
| 5.013075 | 1 | |
| 5.012549 | 1 | |
| 5.011693 | 1 | |
| 5.011688 | 1 | |
| 5.011569 | 1 |
| Distinct | 18671 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 73.0 KiB |
| 0101000020E6100000B81FF0C00072134050C763062A2D4A40 | 1 |
|---|---|
| 0101000020E6100000DC662AC423911340438D4292592D4A40 | 1 |
| 0101000020E6100000E8A4F78DAF9D134083DA6FED442D4A40 | 1 |
| 0101000020E61000001F2DCE18E66413400454388254304A40 | 1 |
| 0101000020E610000048C0E8F2E6601340E8C072840C304A40 | 1 |
| Other values (18666) |
Length
| Max length | 50 |
|---|---|
| Median length | 50 |
| Mean length | 50 |
| Min length | 50 |
Characters and Unicode
| Total characters | 933550 |
|---|---|
| Distinct characters | 16 |
| Distinct categories | 2 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 18671 |
|---|---|
| Unique (%) | 100.0% |
Sample
| 1st row | 0101000020E610000033FAD170CA8C13403BC5AA41982D4A40 |
|---|---|
| 2nd row | 0101000020E6100000842A357BA095134042791F4773304A40 |
| 3rd row | 0101000020E6100000A51133FB3CC613403543AA285E2B4A40 |
| 4th row | 0101000020E6100000DF180280638F134085EE92382B304A40 |
| 5th row | 0101000020E6100000CD902A8A57691340187B2FBE682F4A40 |
Common Values
| Value | Count | Frequency (%) |
| 0101000020E6100000B81FF0C00072134050C763062A2D4A40 | 1 | < 0.1% |
| 0101000020E6100000DC662AC423911340438D4292592D4A40 | 1 | < 0.1% |
| 0101000020E6100000E8A4F78DAF9D134083DA6FED442D4A40 | 1 | < 0.1% |
| 0101000020E61000001F2DCE18E66413400454388254304A40 | 1 | < 0.1% |
| 0101000020E610000048C0E8F2E6601340E8C072840C304A40 | 1 | < 0.1% |
| 0101000020E610000073840CE4D9951340A56ABB09BE2F4A40 | 1 | < 0.1% |
| 0101000020E61000007978CF81E568134016325706D5304A40 | 1 | < 0.1% |
| 0101000020E61000002766BD18CA791340FE65F7E4612F4A40 | 1 | < 0.1% |
| 0101000020E6100000F8713447567E1340F4C0C760C52F4A40 | 1 | < 0.1% |
| 0101000020E6100000E2AFC91AF5801340B0E8D66B7A2E4A40 | 1 | < 0.1% |
| Other values (18661) | 18661 |
Length
Histogram of lengths of the category
| Value | Count | Frequency (%) |
| 0101000020e61000007f6c921ff1ab13404224438ead2d4a40 | 1 | < 0.1% |
| 0101000020e6100000b6476fb88f6c13409487855ad3304a40 | 1 | < 0.1% |
| 0101000020e6100000529b38b9dff11340a089b0e1e9254a40 | 1 | < 0.1% |
| 0101000020e610000041d653abafbe1340d00d4dd9e9314a40 | 1 | < 0.1% |
| 0101000020e6100000469a780778921340bda8ddaf022e4a40 | 1 | < 0.1% |
| 0101000020e61000005feb5223f4c313409badbce47f2e4a40 | 1 | < 0.1% |
| 0101000020e61000009da1b8e34d6e1340f1f09e03cb2d4a40 | 1 | < 0.1% |
| 0101000020e610000029cb10c7ba681340fd2e6ccd562e4a40 | 1 | < 0.1% |
| 0101000020e610000087c0914083ad13406bf12900c62d4a40 | 1 | < 0.1% |
| 0101000020e610000076711b0de06d1340b6813b50a72e4a40 | 1 | < 0.1% |
| Other values (18661) | 18661 |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 289022 | |
| 1 | 100127 | 10.7% |
| 4 | 81005 | 8.7% |
| 2 | 57879 | 6.2% |
| 3 | 47959 | 5.1% |
| E | 47395 | 5.1% |
| 6 | 46025 | 4.9% |
| A | 44985 | 4.8% |
| D | 28459 | 3.0% |
| 8 | 28124 | 3.0% |
| Other values (6) | 162570 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 730873 | |
| Uppercase Letter | 202677 | 21.7% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 289022 | |
| 1 | 100127 | 13.7% |
| 4 | 81005 | 11.1% |
| 2 | 57879 | 7.9% |
| 3 | 47959 | 6.6% |
| 6 | 46025 | 6.3% |
| 8 | 28124 | 3.8% |
| 7 | 28002 | 3.8% |
| 9 | 27802 | 3.8% |
| 5 | 24928 | 3.4% |
Uppercase Letter
| Value | Count | Frequency (%) |
| E | 47395 | |
| A | 44985 | |
| D | 28459 | |
| F | 28051 | |
| C | 27338 | |
| B | 26449 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 730873 | |
| Latin | 202677 | 21.7% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 289022 | |
| 1 | 100127 | 13.7% |
| 4 | 81005 | 11.1% |
| 2 | 57879 | 7.9% |
| 3 | 47959 | 6.6% |
| 6 | 46025 | 6.3% |
| 8 | 28124 | 3.8% |
| 7 | 28002 | 3.8% |
| 9 | 27802 | 3.8% |
| 5 | 24928 | 3.4% |
Latin
| Value | Count | Frequency (%) |
| E | 47395 | |
| A | 44985 | |
| D | 28459 | |
| F | 28051 | |
| C | 27338 | |
| B | 26449 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 933550 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 289022 | |
| 1 | 100127 | 10.7% |
| 4 | 81005 | 8.7% |
| 2 | 57879 | 6.2% |
| 3 | 47959 | 5.1% |
| E | 47395 | 5.1% |
| 6 | 46025 | 4.9% |
| A | 44985 | 4.8% |
| D | 28459 | 3.0% |
| 8 | 28124 | 3.0% |
| Other values (6) | 162570 |
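The `location` values are fixed-length hex strings that all begin with `0101000020E6100000`, which is consistent with hex-encoded EWKB points carrying an SRID. The sketch below decodes one such string under that assumption; the function name `decode_ewkb_point` is hypothetical, and the layout handled here (byte-order flag, geometry type with an SRID flag, optional SRID, two doubles) is an assumption about the encoding, not something the report states.

```python
import struct

SRID_FLAG = 0x20000000  # EWKB flag bit marking that an SRID follows the type


def decode_ewkb_point(hex_string):
    """Decode a hex-encoded EWKB point into (srid, x, y).

    Assumes a 2D point; raises ValueError for any other geometry type.
    """
    data = bytes.fromhex(hex_string)
    # First byte: 1 means little-endian, 0 means big-endian.
    byte_order = "<" if data[0] == 1 else ">"
    (geom_type,) = struct.unpack(byte_order + "I", data[1:5])
    offset = 5
    srid = None
    if geom_type & SRID_FLAG:
        (srid,) = struct.unpack(byte_order + "I", data[offset:offset + 4])
        offset += 4
    if geom_type & 0xFF != 1:
        raise ValueError("only point geometries are handled in this sketch")
    x, y = struct.unpack(byte_order + "dd", data[offset:offset + 16])
    return srid, x, y


# Value taken from the first sample row of the location column.
srid, x, y = decode_ewkb_point(
    "0101000020E610000033FAD170CA8C13403BC5AA41982D4A40"
)
print(srid, round(x, 6), round(y, 6))
```

If the assumption holds, `x` and `y` should correspond to the `longitude` and `latitude` columns of the same row, which would explain why `location` is unique for every observation.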
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for an exactly linear relationship the steepness of the line does not affect r; only the sign of the slope does.
To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
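As a minimal sketch of the calculation described above (not the report generator's own implementation), the following computes r as covariance divided by the product of standard deviations. The function name `pearson_r` and the sample values are illustrative assumptions.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: covariance of X and Y over the product of their
    standard deviations. The 1/n factors cancel, so plain sums are used."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Illustrative data only (not taken from the dataset).
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r_original = pearson_r(x, y)
# Shifting and positively rescaling each variable leaves r unchanged.
r_rescaled = pearson_r([10 * v + 3 for v in x], [0.5 * v - 7 for v in y])
print(r_original, r_rescaled)
```

The check for zero variance matters for this dataset: constant columns such as `survey_id` have no defined correlation with anything.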
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.
To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
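The definition above amounts to applying Pearson's formula to ranks. The sketch below illustrates this, assuming tied values receive the average of their ranks (a common convention, not stated in the report); `average_ranks`, `pearson_r` and `spearman_rho` are hypothetical helper names.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance divided by the product of standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Illustrative data: y = x**3 is monotonic but not linear.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman_rho(x, y), pearson_r(x, y))
```

For the cubic example, ρ reaches its maximum because the ordering is preserved exactly, while Pearson's r stays below 1 because the relationship is not a straight line.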
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.
To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
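The pair-counting rule above can be sketched directly. This version divides by the total number of pairs exactly as described, which corresponds to the variant usually called τ-a; library implementations often apply a tie adjustment instead, so their results may differ when the data contain ties. The function name `kendall_tau_a` and the sample values are illustrative assumptions.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs.

    Pairs tied in either variable count as neither concordant nor discordant.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = 0
    discordant = 0
    for i, j in combinations(range(n), 2):
        # The sign of the product tells whether both variables move
        # in the same direction between observations i and j.
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Illustrative data: one swapped pair out of six.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```

This quadratic loop is fine for illustration but would be slow on all 18671 observations; it is meant to show the definition rather than an efficient method.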
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for Phik.
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion visually.
First rows
| room_id | survey_id | host_id | room_type | city | neighborhood | reviews | overall_satisfaction | accommodates | bedrooms | price | name | last_modified | latitude | longitude | location | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10176931 | 1476 | 49180562 | Shared room | Amsterdam | De Pijp / Rivierenbuurt | 7 | 4.5 | 2 | 1.0 | 156.0 | Red Light/ Canal view apartment (Shared) | 2017-07-23 13:06:27.391699 | 52.356209 | 4.887491 | 0101000020E610000033FAD170CA8C13403BC5AA41982D4A40 |
| 1 | 8935871 | 1476 | 46718394 | Shared room | Amsterdam | Centrum West | 45 | 4.5 | 4 | 1.0 | 126.0 | Sunny and Cozy Living room in quite neighbours | 2017-07-23 13:06:23.607187 | 52.378518 | 4.896120 | 0101000020E6100000842A357BA095134042791F4773304A40 |
| 2 | 14011697 | 1476 | 10346595 | Shared room | Amsterdam | Watergraafsmeer | 1 | 0.0 | 3 | 1.0 | 132.0 | Amsterdam | 2017-07-23 13:06:23.603546 | 52.338811 | 4.943592 | 0101000020E6100000A51133FB3CC613403543AA285E2B4A40 |
| 3 | 6137978 | 1476 | 8685430 | Shared room | Amsterdam | Centrum West | 7 | 5.0 | 4 | 1.0 | 121.0 | Canal boat RIDE in Amsterdam | 2017-07-23 13:06:22.689787 | 52.376319 | 4.890028 | 0101000020E6100000DF180280638F134085EE92382B304A40 |
| 4 | 18630616 | 1476 | 70191803 | Shared room | Amsterdam | De Baarsjes / Oud West | 1 | 0.0 | 2 | 1.0 | 93.0 | One room for rent in a three room appartment | 2017-07-23 13:06:19.681469 | 52.370384 | 4.852873 | 0101000020E6100000CD902A8A57691340187B2FBE682F4A40 |
| 5 | 5790170 | 1476 | 29968916 | Shared room | Amsterdam | De Pijp / Rivierenbuurt | 184 | 4.5 | 2 | 1.0 | 102.0 | Beautiful apartment | 2017-07-23 13:06:19.663975 | 52.342265 | 4.897126 | 0101000020E6100000B090B932A896134060C8EA56CF2B4A40 |
| 6 | 934060 | 1476 | 5037506 | Shared room | Amsterdam | Oostelijk Havengebied / Indische Buurt | 67 | 5.0 | 16 | 1.0 | 462.0 | LOTUS, Classic Dutch Saling Barge | 2017-07-23 13:06:09.988016 | 52.377552 | 4.930418 | 0101000020E61000005D70067FBFB813400B45BA9F53304A40 |
| 7 | 19590049 | 1476 | 132687356 | Shared room | Amsterdam | Westerpark | 2 | 0.0 | 2 | 1.0 | 414.0 | big boot Adam 04 | 2017-07-23 13:06:09.984748 | 52.375205 | 4.866117 | 0101000020E6100000DD09F65FE7761340D925AAB706304A40 |
| 8 | 5020280 | 1476 | 4059485 | Shared room | Amsterdam | Oud Oost | 2 | 0.0 | 2 | 1.0 | 222.0 | Bright modern appartment in East! | 2017-07-23 13:06:07.452609 | 52.357346 | 4.912887 | 0101000020E610000032C687D9CBA613409FAD8383BD2D4A40 |
| 9 | 15810783 | 1476 | 84978218 | Shared room | Amsterdam | Centrum West | 0 | 0.0 | 12 | 1.0 | 301.0 | CANAL BOATTOUR AMSTERDAM covered boat 1,5 hour | 2017-07-23 13:06:07.447989 | 52.386610 | 4.890128 | 0101000020E6100000FB03E5B67D8F13403D27BD6F7C314A40 |
Last rows
| room_id | survey_id | host_id | room_type | city | neighborhood | reviews | overall_satisfaction | accommodates | bedrooms | price | name | last_modified | latitude | longitude | location | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 18661 | 2763386 | 1476 | 14122005 | Private room | Amsterdam | Slotervaart | 118 | 5.0 | 2 | 1.0 | 36.0 | Comfortable SKY ROOM 12th floor | 2017-07-22 16:05:14.173175 | 52.361043 | 4.846134 | 0101000020E6100000792288F37062134091B932A8362E4A40 |
| 18662 | 19203256 | 1476 | 132265798 | Private room | Amsterdam | Bijlmer Centrum | 1 | 0.0 | 4 | 1.0 | 35.0 | NEW Stylish room, Ziggodome, AFAS LIVE, ArenA, RAI | 2017-07-22 16:05:14.168799 | 52.320049 | 4.955609 | 0101000020E6100000950D6B2A8BD213400A0F9A5DF7284A40 |
| 18663 | 19734178 | 1476 | 139135665 | Private room | Amsterdam | Osdorp | 0 | 0.0 | 1 | 0.0 | 30.0 | Cozy Apartment in Nieuw-West | 2017-07-22 16:05:14.166410 | 52.356702 | 4.792346 | 0101000020E61000003677F4BF5C2B13407A354069A82D4A40 |
| 18664 | 288967 | 1476 | 1501422 | Private room | Amsterdam | De Baarsjes / Oud West | 281 | 5.0 | 3 | 1.0 | 36.0 | BandB de Baarsjes Amsterdam | 2017-07-22 16:05:14.163973 | 52.361918 | 4.855507 | 0101000020E61000000DFFE9060A6C1340B8EA3A54532E4A40 |
| 18665 | 16685383 | 1476 | 5831960 | Private room | Amsterdam | Bos en Lommer | 5 | 5.0 | 2 | 1.0 | 30.0 | A nice bed in the attic of my 'palace'. | 2017-07-22 16:05:14.161714 | 52.379638 | 4.848829 | 0101000020E6100000E695EB6D33651340D0285DFA97304A40 |
| 18666 | 17789893 | 1476 | 47501089 | Private room | Amsterdam | Bijlmer Centrum | 10 | 5.0 | 3 | 1.0 | 32.0 | 1-3 pers. Cozy Rm AFAS Live, ArenA, ZIGGODOME | 2017-07-22 16:05:14.158963 | 52.319794 | 4.955638 | 0101000020E6100000684293C492D2134080BA8102EF284A40 |
| 18667 | 16877166 | 1476 | 67093870 | Private room | Amsterdam | Bijlmer Centrum | 6 | 5.0 | 4 | 1.0 | 24.0 | Modern Room by Arena, ZIGGO, HmH | 2017-07-22 16:05:14.151986 | 52.319080 | 4.954822 | 0101000020E61000005801BEDBBCD1134062670A9DD7284A40 |
| 18668 | 19859427 | 1476 | 29724632 | Private room | Amsterdam | Geuzenveld / Slotermeer | 0 | 0.0 | 1 | 1.0 | 38.0 | Private single room | 2017-07-22 16:05:14.149610 | 52.384028 | 4.838403 | 0101000020E61000002079E750865A1340C85F5AD427314A40 |
| 18669 | 17132164 | 1476 | 115156569 | Private room | Amsterdam | Centrum West | 13 | 4.5 | 2 | 1.0 | 36.0 | City Center studio in Touristic Amsterdam 1 | 2017-07-22 16:05:14.146183 | 52.372120 | 4.890982 | 0101000020E6100000774CDD955D9013400118CFA0A12F4A40 |
| 18670 | 7605782 | 1476 | 39503013 | Private room | Amsterdam | Centrum West | 113 | 4.5 | 2 | 1.0 | 35.0 | I have a room available for rent | 2017-07-22 16:05:12.257054 | 52.381392 | 4.899658 | 0101000020E6100000CD565EF23F9913405F7AFB73D1304A40 |